Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

Abstract

This paper studies adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverages the two most common categories of side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of K actions after being presented with a d-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structure, which is encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on EXP3. Under mild conditions, our analysis shows that, for undirected feedback graphs, the first algorithm, EXP3-LGC-U, achieves sub-linear regret with respect to the time horizon and the average independence number of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, EXP3-LGC-IX, is developed for a special class of problems and attains the same regret guarantee for both directed and undirected feedback graphs. Numerical tests corroborate the efficiency of the proposed algorithms.
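For intuition, the following is a minimal sketch of an EXP3-style update with graph-structured side observations. It is not the paper's EXP3-LGC-U or EXP3-LGC-IX (those additionally exploit the d-dimensional context through a linear loss model); it only illustrates the exponential-weights step with importance-weighted loss estimates built from every arm revealed by the feedback graph. The function name exp3_graph_step and the neighborhood convention (each arm observes itself) are illustrative assumptions.

    import numpy as np

    def exp3_graph_step(weights, losses, neighbors, eta, rng):
        """One round of an EXP3-style update with graph side observations.

        weights   : current exponential weights, shape (K,)
        losses    : this round's adversarial losses in [0, 1], shape (K,)
        neighbors : list of sets; neighbors[j] is the set of arms whose losses
                    are observed when arm j is played (assumed to contain j)
        eta       : learning rate
        rng       : numpy random Generator
        """
        K = len(weights)
        probs = weights / weights.sum()            # sampling distribution p_t
        arm = rng.choice(K, p=probs)               # play one arm

        # Probability that arm i is observed this round:
        # the total probability of playing any arm that reveals i.
        obs_prob = np.array([
            sum(probs[j] for j in range(K) if i in neighbors[j])
            for i in range(K)
        ])

        # Importance-weighted loss estimates: only arms actually observed
        # (the neighbors of the played arm) contribute.
        loss_hat = np.zeros(K)
        for i in neighbors[arm]:
            loss_hat[i] = losses[i] / obs_prob[i]

        weights = weights * np.exp(-eta * loss_hat)  # exponential-weights update
        return arm, weights

Run over T rounds with a learning rate on the order of sqrt(log K / T), this recovers the usual graph-feedback EXP3 behavior; the paper's algorithms instead build estimates of a linear loss model from the observed losses and context vectors, which is what ties their regret to the context dimension d and the average independence number.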

Related Articles

Linear Contextual Bandits with Knapsacks

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...

Risk-Aware Algorithms for Adversarial Contextual Bandits

In this work we consider adversarial contextual bandits with risk constraints. At each round, nature prepares a context, a cost for each arm, and additionally a risk for each arm. The learner leverages the context to pull an arm and then receives the corresponding cost and risk associated with the pulled arm. In addition to minimizing the cumulative cost, the learner also needs to satisfy long-...

Conservative Contextual Linear Bandits

Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits that have application in many different fields includin...

Contextual Bandits with Linear Payoff Functions

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d-dimensional feature vectors, we prove an O(√(Td ln(KT ln(T)/δ))) regret bound that holds with probability 1 − δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm...
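For reference, the basic linear upper-confidence-bound selection rule for this setting can be sketched as below. This is the generic ridge-regression UCB step commonly known as LinUCB, offered as an illustration rather than the exact algorithm analyzed in the cited excerpt; the confidence-width parameter alpha and the function names are assumptions.

    import numpy as np

    def linucb_select(A, b, contexts, alpha=1.0):
        """Choose an arm by a linear upper-confidence-bound rule.

        A        : list of d x d ridge-regression Gram matrices, one per arm
        b        : list of d-dimensional response vectors, one per arm
        contexts : array of shape (K, d), this round's feature vector per arm
        alpha    : width of the exploration bonus (tuning parameter)
        """
        scores = []
        for a, (A_a, b_a) in enumerate(zip(A, b)):
            A_inv = np.linalg.inv(A_a)
            theta_hat = A_inv @ b_a                  # per-arm ridge estimate
            x = contexts[a]
            bonus = alpha * np.sqrt(x @ A_inv @ x)   # confidence width along x
            scores.append(x @ theta_hat + bonus)
        return int(np.argmax(scores))

    def linucb_update(A, b, arm, x, reward):
        """Rank-one update of the chosen arm's statistics."""
        A[arm] += np.outer(x, x)
        b[arm] += reward * x

Initializing each A[a] to the d x d identity and each b[a] to zeros gives the standard ridge-regularized start.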

Structured Stochastic Linear Bandits

The stochastic linear bandit problem proceeds in rounds where at each round the algorithm selects a vector from a decision set after which it receives a noisy linear loss parameterized by an unknown vector. The goal in such a problem is to minimize the (pseudo) regret which is the difference between the total expected loss of the algorithm and the total expected loss of the best fixed vector in...
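In symbols, the pseudo-regret described above can be written as follows; the notation (decision set D, unknown loss parameter θ, chosen vector x_t) is introduced here only for illustration and is not taken from the excerpt:

    % Pseudo-regret over T rounds: expected total loss of the algorithm
    % minus the expected total loss of the best fixed vector in the decision set D.
    R_T \;=\; \mathbb{E}\Big[\sum_{t=1}^{T} \langle x_t, \theta \rangle\Big]
          \;-\; \min_{x \in D} \sum_{t=1}^{T} \langle x, \theta \rangle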

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i11.17218